Using spectro-temporal features to improve AFE feature extraction for ASR
Authors
Abstract
Previous work has shown that spectro-temporal features reduce the word error rate (WER) of automatic speech recognition (ASR) under noisy conditions. The spectro-temporal framework, however, is not the only way to process features to reduce errors caused by noise in the signal. Another is the two-stage mel-warped Wiener filtering method used in the “Advanced Front End” (AFE), now a standard front end for robust recognition. Since the spectro-temporal approach can be applied to a noise-reduced spectrum, we explored whether spectro-temporal features could improve the performance of AFE for ASR. We show that computing spectro-temporal features after AFE processing results in a 45% relative improvement over AFE in clean conditions and a 6% to 30% improvement in noisy conditions on the Aurora2 clean training setup.
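The abstract does not say which spectro-temporal filters were used, so the following is only an illustrative sketch of the general idea: filtering a noise-reduced time-frequency representation with a 2D modulation filter. It assumes a Gabor-style filter (a cosine carrier under a Hann envelope). The function names `gabor_kernel` and `filter_spectrogram`, the filter sizes, and the toy spectrogram are all hypothetical, and the AFE noise reduction itself is not implemented here.

```python
import math


def gabor_kernel(n_freq, n_time, omega_f, omega_t):
    """Real-valued 2D Gabor-style kernel (assumed filter shape, not from the paper).

    omega_f: spectral modulation frequency in radians per channel.
    omega_t: temporal modulation frequency in radians per frame.
    """
    cf, ct = (n_freq - 1) / 2.0, (n_time - 1) / 2.0
    kernel = []
    for i in range(n_freq):
        row = []
        for j in range(n_time):
            # Separable Hann envelope over frequency and time.
            env_f = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (n_freq + 1))
            env_t = 0.5 - 0.5 * math.cos(2 * math.pi * (j + 1) / (n_time + 1))
            # Cosine carrier centred on the middle of the kernel.
            carrier = math.cos(omega_f * (i - cf) + omega_t * (j - ct))
            row.append(env_f * env_t * carrier)
        kernel.append(row)
    # Subtract the mean so a flat spectrogram produces no response.
    mean = sum(map(sum, kernel)) / (n_freq * n_time)
    return [[v - mean for v in row] for row in kernel]


def filter_spectrogram(spec, kernel):
    """Same-size 2D correlation of spec (channels x frames) with zero padding."""
    n_f, n_t = len(spec), len(spec[0])
    k_f, k_t = len(kernel), len(kernel[0])
    off_f, off_t = k_f // 2, k_t // 2
    out = [[0.0] * n_t for _ in range(n_f)]
    for f in range(n_f):
        for t in range(n_t):
            acc = 0.0
            for i in range(k_f):
                for j in range(k_t):
                    ff, tt = f + i - off_f, t + j - off_t
                    if 0 <= ff < n_f and 0 <= tt < n_t:
                        acc += kernel[i][j] * spec[ff][tt]
            out[f][t] = acc
    return out


# Toy stand-in for a noise-reduced log-mel spectrogram (9 channels x 12 frames).
toy_spec = [[math.sin(0.5 * f + 0.3 * t) for t in range(12)] for f in range(9)]
kernel = gabor_kernel(5, 5, omega_f=0.5, omega_t=0.3)
features = filter_spectrogram(toy_spec, kernel)
```

In a real system a bank of such filters, each tuned to a different pair of modulation frequencies, would be applied to the noise-reduced spectrogram, and the outputs would be stacked into the feature vector passed to the recognizer.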
Related resources
Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
Physiologically motivated feature extraction methods based on 2D-Gabor filters have already been used successfully in robust automatic speech recognition (ASR) systems. Recently it was shown that a Mel Frequency Cepstral Coefficients (MFCC) baseline can be improved with physiologically motivated features extracted by a 2D-Gabor filter bank (GBFB). Besides physiologically inspired approaches to ...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Spectro-temporal Gabor features as a front end for automatic speech recognition
A novel type of feature extraction is introduced to be used as a front end for automatic speech recognition (ASR). Two-dimensional Gabor filter functions are applied to a spectro-temporal representation formed by columns of primary feature vectors. The filter shape is motivated by recent findings in neurophysiology and psychoacoustics which revealed sensitivity towards complex spectro-temporal ...
Human and automatic speech recognition in the presence of speech-intrinsic variations
Despite several decades of research, automatic speech recognition (ASR) lacks the performance achieved by human listeners. One of the major challenges in ASR is to cope with the immense variability of spoken language, which can be categorized into extrinsic sources (e.g., additive noise) and intrinsic factors (such as speaking rate, style, effort, dialect, and accent). What can we learn from th...
Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.
In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically int...